Skip to content

Refactor range/gen_series signature away from user defined#18317

Merged
alamb merged 4 commits intoapache:mainfrom
Jefffrey:gen-series-signature
Oct 30, 2025
Merged

Refactor range/gen_series signature away from user defined#18317
alamb merged 4 commits intoapache:mainfrom
Jefffrey:gen-series-signature

Conversation

@Jefffrey
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

Trying to move away from user defined signatures where possible; mainly to ensure consistency of error checking/messages. The original issue is because the function has to do this checking itself leading to inconsistency of error used (ideally shouldn't be internal). By uplifting away from a user defined signature we can make use of existing code meant to handle this checking and error messages for us.

What changes are included in this PR?

Defined range/generate_series signature via coercible API instead of being user defined. Some accompanying changes are needed in the signature code to make this possible.

Are these changes tested?

Added SLT tests and fixed any existing ones.

Are there any user-facing changes?

No (error messages do change though)

@github-actions github-actions Bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) common Related to common crate labels Oct 28, 2025
TypeSignatureClass::Binary if native_type.is_binary() => {
Ok(origin_type.to_owned())
}
_ if native_type.is_null() => Ok(origin_type.to_owned()),
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were missing this even though we check for it in matches_native_type; I think this will allow passing null types to coercible signatures without needing to explicitly specify null type handling

}

impl Range {
fn defined_signature() -> Signature {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main change

Comment on lines +368 to +373
// Signature can only guarantee we get a date type, not specifically
// date32 so handle potential cast from date64 here.
let start = cast(start, &DataType::Date32)?;
let start = as_date32_array(&start)?;
let stop = cast(stop, &DataType::Date32)?;
let stop = as_date32_array(&stop)?;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see much benefit in handling date64 natively anyway (given it's a bit of a weird type) so I think this is fine

Comment on lines +425 to +440
// Signature can only guarantee we get a timestamp type, not specifically
// timestamp(ns) so handle potential cast from other timestamps here.
fn cast_to_ns(arr: &ArrayRef) -> Result<ArrayRef> {
match arr.data_type() {
DataType::Timestamp(TimeUnit::Nanosecond, _) => Ok(Arc::clone(arr)),
DataType::Timestamp(_, tz) => Ok(cast(
arr,
&DataType::Timestamp(TimeUnit::Nanosecond, tz.to_owned()),
)?),
_ => unreachable!(),
}
}
let start = cast_to_ns(start)?;
let start = as_timestamp_nanosecond_array(&start)?;
let stop = cast_to_ns(stop)?;
let stop = as_timestamp_nanosecond_array(&stop)?;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if is better to instead try uplift the signature coercible API to allow specifying we only want timestamp(ns) (of any timezone) and let that handle coercion for us 🤔

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that would be nicer but I think it could also be done as a follow on PR

Comment on lines +6941 to +6947
range(arrow_cast(1, 'Int8'), 5, -1),
range(arrow_cast(1, 'Int16'), arrow_cast(-5, 'Int8'), 1),
range(arrow_cast(1, 'Int32'), arrow_cast(-5, 'Int16'), arrow_cast(-1, 'Int8')),
range(DATE '1992-09-01', DATE '1993-03-01', arrow_cast('1 MONTH', 'Interval(YearMonth)')),
range(DATE '1993-02-01', arrow_cast(DATE '1993-01-01', 'Date64'), INTERVAL '-1' DAY),
range(arrow_cast(DATE '1989-04-01', 'Date64'), DATE '1993-03-01', INTERVAL '1' YEAR),
range(arrow_cast(DATE '1993-03-01', 'Date64'), arrow_cast(DATE '1989-04-01', 'Date64'), INTERVAL '1' YEAR)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uplift tests to ensure coercion is handled properly for different types

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend making this more explicit by:

  1. Revert the change to the existing tests
  2. Add a new query right below this one that adds the casts to the other integer types

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

Comment on lines -7134 to +7156
query error DataFusion error: Internal error: Unexpected argument type for generate_series : Date32
query error DataFusion error: Error during planning: Internal error: Function 'generate_series' failed to match any signature
select generate_series(arrow_cast('2021-01-01T00:00:00', 'Timestamp(Nanosecond, Some("-05:00"))'), DATE '2021-01-02', INTERVAL '1' HOUR);

## mixing types not allowed even if an argument is null
query error DataFusion error: Error during planning: Internal error: Function 'generate_series' failed to match any signature
select generate_series(TIMESTAMP '1992-09-01', DATE '1993-03-01', NULL);

query error DataFusion error: Error during planning: Internal error: Function 'generate_series' failed to match any signature
select generate_series(1, '2024-01-01', '2025-01-02');

query error DataFusion error: Error during planning: Internal error: Function 'generate_series' failed to match any signature
select generate_series('2024-01-01'::timestamp, '2025-01-02', interval '1 day');
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So these are the original examples from the issue; technically we still return an internal error, so it feels a bit wrong to claim we close #15881 (well its an internal error nested in a plan error apparently)

However this internal error comes from the function signature related code, so we can fix it there and it should propagate to here (aka its not a specific issue with generate_series)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree we should fix it in the signature related code

I think it would also be ideal if we can provide more details on what went wrong and how we can fix it (aka as a user who doesn't understand the internal implementation, if I get this error message, how do I fix it?)

@Jefffrey Jefffrey marked this pull request as ready for review October 28, 2025 08:27
Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Jefffrey -- I will try and make it through this PR later today

};
}

macro_rules! singleton_variant {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you potentially add some documentation here about what is different about this macro compared to the others in this module?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Jefffrey -- this looks like an improvement to me -- I had a few suggestions and I think we should try and improve the messages even more, but we can do that as a follow on PR


// Signature can only guarantee we get a timestamp type, not specifically
// timestamp(ns) so handle potential cast from other timestamps here.
fn cast_to_ns(arr: &ArrayRef) -> Result<ArrayRef> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given cast already checks for the same type, I wonder if this is necessary? Or is the idea to cast to nanoseconds and maintain the timezone?

Copy link
Copy Markdown
Contributor Author

@Jefffrey Jefffrey Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's to cast from potentially other timestamp time units down to nanoseconds

Comment on lines +425 to +440
// Signature can only guarantee we get a timestamp type, not specifically
// timestamp(ns) so handle potential cast from other timestamps here.
fn cast_to_ns(arr: &ArrayRef) -> Result<ArrayRef> {
match arr.data_type() {
DataType::Timestamp(TimeUnit::Nanosecond, _) => Ok(Arc::clone(arr)),
DataType::Timestamp(_, tz) => Ok(cast(
arr,
&DataType::Timestamp(TimeUnit::Nanosecond, tz.to_owned()),
)?),
_ => unreachable!(),
}
}
let start = cast_to_ns(start)?;
let start = as_timestamp_nanosecond_array(&start)?;
let stop = cast_to_ns(stop)?;
let stop = as_timestamp_nanosecond_array(&stop)?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that would be nicer but I think it could also be done as a follow on PR

Comment on lines +6941 to +6947
range(arrow_cast(1, 'Int8'), 5, -1),
range(arrow_cast(1, 'Int16'), arrow_cast(-5, 'Int8'), 1),
range(arrow_cast(1, 'Int32'), arrow_cast(-5, 'Int16'), arrow_cast(-1, 'Int8')),
range(DATE '1992-09-01', DATE '1993-03-01', arrow_cast('1 MONTH', 'Interval(YearMonth)')),
range(DATE '1993-02-01', arrow_cast(DATE '1993-01-01', 'Date64'), INTERVAL '-1' DAY),
range(arrow_cast(DATE '1989-04-01', 'Date64'), DATE '1993-03-01', INTERVAL '1' YEAR),
range(arrow_cast(DATE '1993-03-01', 'Date64'), arrow_cast(DATE '1989-04-01', 'Date64'), INTERVAL '1' YEAR)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend making this more explicit by:

  1. Revert the change to the existing tests
  2. Add a new query right below this one that adds the casts to the other integer types

Comment on lines -7134 to +7156
query error DataFusion error: Internal error: Unexpected argument type for generate_series : Date32
query error DataFusion error: Error during planning: Internal error: Function 'generate_series' failed to match any signature
select generate_series(arrow_cast('2021-01-01T00:00:00', 'Timestamp(Nanosecond, Some("-05:00"))'), DATE '2021-01-02', INTERVAL '1' HOUR);

## mixing types not allowed even if an argument is null
query error DataFusion error: Error during planning: Internal error: Function 'generate_series' failed to match any signature
select generate_series(TIMESTAMP '1992-09-01', DATE '1993-03-01', NULL);

query error DataFusion error: Error during planning: Internal error: Function 'generate_series' failed to match any signature
select generate_series(1, '2024-01-01', '2025-01-02');

query error DataFusion error: Error during planning: Internal error: Function 'generate_series' failed to match any signature
select generate_series('2024-01-01'::timestamp, '2025-01-02', interval '1 day');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree we should fix it in the signature related code

I think it would also be ideal if we can provide more details on what went wrong and how we can fix it (aka as a user who doesn't understand the internal implementation, if I get this error message, how do I fix it?)

@alamb alamb added this pull request to the merge queue Oct 30, 2025
Merged via the queue into apache:main with commit 67550f1 Oct 30, 2025
32 checks passed
@Jefffrey Jefffrey deleted the gen-series-signature branch October 31, 2025 02:16
tschwarzinger pushed a commit to tschwarzinger/datafusion that referenced this pull request Nov 2, 2025
…e#18317)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes apache#15881
  - See my notes below

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Trying to move away from user defined signatures where possible; mainly
to ensure consistency of error checking/messages. The original issue is
because the function has to do this checking itself leading to
inconsistency of error used (ideally shouldn't be internal). By
uplifting away from a user defined signature we can make use of existing
code meant to handle this checking and error messages for us.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Defined range/generate_series signature via coercible API instead of
being user defined. Some accompanying changes are needed in the
signature code to make this possible.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Added SLT tests and fixed any existing ones.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

No (error messages do change though)

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
codetyri0n pushed a commit to codetyri0n/datafusion that referenced this pull request Nov 11, 2025
…e#18317)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes apache#15881
  - See my notes below

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Trying to move away from user defined signatures where possible; mainly
to ensure consistency of error checking/messages. The original issue is
because the function has to do this checking itself leading to
inconsistency of error used (ideally shouldn't be internal). By
uplifting away from a user defined signature we can make use of existing
code meant to handle this checking and error messages for us.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Defined range/generate_series signature via coercible API instead of
being user defined. Some accompanying changes are needed in the
signature code to make this possible.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Added SLT tests and fixed any existing ones.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

No (error messages do change though)

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Internal error with generate_series: Internal error: could not cast array of type Date32

2 participants